Using uncertainty to link and rank evidence from biomedical literature for model curation

Authors

  • Chrysoula Zerva
  • Riza Batista-Navarro
  • Philip Day
  • Sophia Ananiadou

Abstract

Motivation: In recent years, there has been great progress in the automated curation of biomedical networks and models, aided by text mining methods that provide evidence from the literature. Such methods must not only extract snippets of text that relate to model interactions, but also contextualize the evidence and provide additional confidence scores for the interaction in question. Although various approaches to calculating confidence scores have focused primarily on the quality of the extracted information, there has been little work on exploring the textual uncertainty conveyed by the author. While textual uncertainty is acknowledged in biomedical text mining as an attribute of text-mined interactions (events), it remains significantly understudied as a means of providing a confidence measure for interactions in pathways or other biomedical models. In this work, we focus on improving the identification of textual uncertainty for events and explore how it can be used as an additional measure of confidence for biomedical models.

Results: We present a novel method for extracting uncertainty from the literature using a hybrid approach that combines rule induction and machine learning. We then discuss variations of this hybrid approach, along with their advantages and disadvantages. We use subjective logic theory to combine multiple uncertainty values extracted from different sources for the same interaction. Our approach achieves F-scores of 0.76 and 0.88 on the BioNLP-ST and Genia-MK corpora, respectively, a considerable improvement over previously published work. Moreover, we evaluate our proposed system on pathways related to two different areas, namely leukemia and melanoma cancer research.

Availability and implementation: The leukemia pathway model used is available in Pathway Studio, while the Ras model is available via PathwayCommons. An online demonstration of the uncertainty extraction system is available for research purposes at http://argo.nactem.ac.uk/test. The related code is available at https://github.com/c-zrv/uncertainty_components.git. Details on the above are available in the Supplementary Material.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
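
The abstract states that subjective logic is used to combine several uncertainty values extracted from different sources for the same interaction, but does not say which operator. The sketch below illustrates one standard subjective-logic operator, cumulative fusion of binomial opinions, purely as an illustration. Everything in it is an assumption rather than the authors' implementation: the Opinion class, the cumulative_fuse and from_evidence helpers, the shared base rate of 0.5, the prior weight W = 2, and the numeric opinions assigned to "assertive" and "hedged" mentions are all hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Opinion:
    """Hypothetical binomial subjective-logic opinion about one interaction.

    belief + disbelief + uncertainty must sum to 1; base_rate is the prior
    probability that applies when there is no evidence at all.
    """
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5  # assumed prior, not taken from the paper

    def __post_init__(self) -> None:
        total = self.belief + self.disbelief + self.uncertainty
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"belief + disbelief + uncertainty must be 1, got {total}")

    def projected_probability(self) -> float:
        # P = b + a * u collapses the opinion into a single score usable for ranking.
        return self.belief + self.base_rate * self.uncertainty


def from_evidence(positive: float, negative: float, prior_weight: float = 2.0) -> Opinion:
    """Hypothetical helper: map evidence counts (r, s) to an opinion with prior weight W."""
    total = positive + negative + prior_weight
    return Opinion(positive / total, negative / total, prior_weight / total)


def cumulative_fuse(x: Opinion, y: Opinion) -> Opinion:
    """Cumulative fusion of two independent opinions about the same interaction.

    This sketch assumes both opinions share one base rate, which is carried over unchanged.
    """
    if x.base_rate != y.base_rate:
        raise ValueError("this sketch assumes equal base rates")
    ux, uy = x.uncertainty, y.uncertainty
    if ux == 0.0 and uy == 0.0:
        # Both sources are dogmatic (no uncertainty): use an equal-weight average.
        return Opinion(
            (x.belief + y.belief) / 2,
            (x.disbelief + y.disbelief) / 2,
            0.0,
            x.base_rate,
        )
    k = ux + uy - ux * uy  # normalising term, > 0 whenever at least one uncertainty > 0
    return Opinion(
        (x.belief * uy + y.belief * ux) / k,
        (x.disbelief * uy + y.disbelief * ux) / k,
        (ux * uy) / k,
        x.base_rate,
    )


# Hypothetical example: three sentences mention the same interaction. The assertive
# one is mapped to low uncertainty; the two hedged ones (e.g. "may activate") to high.
assertive = Opinion(belief=0.8, disbelief=0.0, uncertainty=0.2)
hedged = Opinion(belief=0.4, disbelief=0.0, uncertainty=0.6)
combined = reduce(cumulative_fuse, [assertive, hedged, hedged])
print(f"fused uncertainty={combined.uncertainty:.3f}, score={combined.projected_probability():.3f}")
# prints: fused uncertainty=0.158, score=0.921
```

Cumulative fusion is equivalent to pooling the evidence counts of independent sources, so each additional hedged mention still lowers the fused uncertainty a little. If the sources cannot be treated as independent (for example, one paper restating another), subjective logic's averaging fusion operator would be the more appropriate choice.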

Similar resources

Study of the foundation, models and issues of research data curation and management in scientific and academic environments

Background and Aim: The purpose of this paper is to study, identify and discuss the foundation and concepts, models and frameworks, dimensions and challenges of research data curation and management in scientific and academic environments. Method: This article is a review article and library method was used to collect scientific and research texts in this field. In this research, external an...

Pseudo-Likelihood Inference Underestimates Model Uncertainty: Evidence from Bayesian Nearest Neighbours

When using the K-nearest neighbours (KNN) method, one often ignores the uncertainty in the choice of K. To account for such uncertainty, Bayesian KNN (BKNN) has been proposed and studied (Holmes and Adams 2002; Cucala et al. 2009). We present some evidence to show that the pseudo-likelihood approach for BKNN, even after being corrected by Cucala et al. (2009), still significantly underest...

On the Reachability of Trustworthy Information from Integrated Exploratory Biological Queries

Levels of curation across biological databases are widely recognized as being highly variable, depending on provenance and type. In spite of ambiguous quality, searches against biological sources, such as those for sequence homology, remain a frontline strategy for biomedical scientists studying molecular data. In the following, we investigate the accessibility of well-curated data retrieved fr...

Recommending MeSH terms for annotating biomedical articles

BACKGROUND Due to the high cost of manual curation of key aspects from the scientific literature, automated methods for assisting this process are greatly desired. Here, we report a novel approach to facilitate MeSH indexing, a challenging task of assigning MeSH terms to MEDLINE citations for their archiving and retrieval. METHODS Unlike previous methods for automatic MeSH term assignment, we...

The BioGRID interaction database: 2015 update

The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access database that houses genetic and protein interactions curated from the primary biomedical literature for all major model organism species and humans. As of September 2014, the BioGRID contains 749,912 interactions as drawn from 43,149 publications that represent 30 model organisms. This...

Journal: Bioinformatics

Volume: 33

Publication date: 2017